HIVBackCalc is a tool for the estimation of HIV incidence and undiagnosed cases. The method combines data on the number of diagnoses per quarter with information on the distribution of the time between HIV infection and diagnosis, or TID. These two elements are used to backcalculate the number of incident cases per quarter that must have occurred in order to produce the observed number of diagnoses. The number of undiagnosed cases per quarter are those cases who are estimated to have already been infected but not yet diagnosed in a given quarter. Because TID is not directly observed, the method uses the time between last negative HIV test and diagnosis to approximate TID. Additional reading can be found here HIVBackCalc working paper.
This tool is designed to work with data from the Enhanced HIV/AIDS Reporting System (eHARS) person files. The following SAS program can be used to extract the required data from eHARS person files.
right-click on any white space and use save as to download the script to your computer.
-Note that the SAS script requires that the eHARS data are in the “person” data file structure. Future versions of HIVBackCalc will be able to work directly with the “pretest_questionair” data files and automatically construct the person level data files.
Once the required variables are extracted from eHARS the user will need to modify/construct a few key variables used by the HIVBackCalc tool.
-Note: In future versions of HIVBackCalc the tool will automatically reformat the eHARS data and construct the variables used in the back-calculation that are not directly reported in eHARS.
The tables below provide a list of the eHARS variables needed by HIVBackCalc and a list of new variable names and the variable structures that will need to be created by the user for use by HIVBackCalc.
If using data other than eHARS these are the variable that will need to be provided by the user.
| eHARS name | Description | Type |
|---|---|---|
| hiv_aids_age_yrs | age at diagnosis in years | |
| race | race | categorical |
| trans_categ | mode of HIV transmission | categorical |
| hiv_dx_dt | date of first pos HIV test (lab confirmed) | |
| aids_dx_dt | date of first AIDS classifying condition | |
| screen_last_neg_dt | date of last neg HIV test (lab confirmed) | |
| cd4_first_hiv_dt | date of earliest CD4 | |
| cd4_first_hiv_type | type of CD4 test | count or percent |
| cd4_first_hiv_value | result of CD4 test | |
| vl_first_det_dt | date of earliest detectable viral load | |
| vl_first_det_value | value of detectable viral load | |
| rsh_state_cd | state of residence at HIV diagnosis | |
| tth_ever_neg | TTH: Y/N ever had a neg HIV test | |
| tth_last_neg_dt | TTH: date of last neg HIV test | |
| tth_first_pos_dt | TTH: date of first pos HIV test |
Ultimately, the HIVBackCalc will automatically rename and transform the eHARS data in to the following Variables. Currently the beta version of the application requires that the user format the data using the following variable names and values:
| Variable name | Description | Type | Values |
|---|---|---|---|
| mode | mode of transmission groups | character | TBD |
| hdx_age | age at diagnosis | numeric | |
| yearDx | year of diagnosis | numeric | |
| everHadNegTest | ever had a negative HIV test | character | TRUE FALSE or NA |
| race | race | character | white, Black, Other |
| agecat5 | describes 5-year age groups | character | |
| timeDx | quarter-year of diagnosis | numeric | 1 to 4 |
| infPeriod | time from last negative test to diagnosis in years | numeric |
This section provides a step by step example to illustrate the use and functionality of HIVBackCalc beginning with installing the necessary software and working all the way through importing and analyzing data. Users already familiar with R may skip forward to section 4 of this guide which provides an abbreviated reference guide for using HIVBackCalc. HIVBackCalc is an open source application available for download and use on internal systems. While HIVBackCalc is designed to be an easy to use application with an intuitive GUI interface it depends upon R software, a free software environment for statistical computing and graphics, and several R packages including devtools and shiny. In order to run HIVBackCalc you must have the R software and supporting R packages downloaded on your local machine. The HIVBackCalc App also uses the web browser to provide an easy to use front end. Before trying to run HIVBackCalc please make sure your browser is up to date. Google chrome is recomended. There are know issues with older versions of Internet Explorer.
HIVBackCalc has a minimum R requirement of R version 3.1.2.
To download the R software package click on the link below and then select the appropriate OS.
(R)
After installing R you will need to install Rstudio which is a GUI user interface for R. The Rstudio interface will make running HIVBackCalc easier and assist with the diagnosis of any problems. Download and install Rstudio by clicking the link below.
(Rstudio)
After installing Rstudio several R packages will also need to be downloaded and installed. To download and install the devtools package:
Type or copy the following lines into the R console window, hitting return after each line:
install.packages('devtools')
library(devtools)When installing devtools R will automatically install any additional packages that are required by the devtools package. R will provide output indicating which additional packages are being installed. See the example below:
The HIVBackCalc app also uses a web application tool called Shiny that provides a GUI interface for R based applications. The Shiny package and its dependencies are required to run HIVBackCalc. To download the this software type or copy and paste the command below in to the Rstudio console. This command will install Shiny and its dependencies. As this command is executed R will provide information about all of the dependencies that are also being installed. See the example:
source_url('https://gist.github.com/netterie/65ae953108408a62539d/raw/HIVBackCalc_PackageInstalls.R')
Start Shiny with R by typing the following into the R console window and hitting return:
library(shiny)
Once shiny is running in the R session HIVBackCalc can be run by typing the code below in to the R console window (or cut and paste) and pressing enter:
runGitHub('netterie/HIVBackCalc_App', launch.browser=TRUE)
If an error appears after the above step, e.g. “cannot open URL”, use these alternate instructions.
A. In a browser, navigate to https://github.com/netterie/HIVBackCalc_App
B. Click the “Download ZIP” button in the bottom right corner
C. Rename the file “HIVBackCalc_App.zip” (remove the -master part) and save to a simple location, e.g. C:
D. Unzip the .zip file in that location
E. In R, type the following, hitting “enter” after each line. The syntax below assumes that the folder has been unzipped to C:, so the folder C:_App is on the machine. If a different location is selected for the HIVBackCalc app, replace C: with the appropriate location, using only forward slashes instead of backslashes, as shown below.
setwd('C:/HIVBackCalc_App')
library(shiny)
runApp(, launch.browser=TRUE)
In this example we will use a simulated data set that resembles data from King County Washington. The data is preloaded in to the app but the example data set can also be downloaded using the link below.
right-click on any whitespace and use the save as to download the data to your computer.
From the load data tab on the HIVBackCalc app you can specify the data file format. The default settings specify that the data has headers, is comma delineated and uses double quotes. The default is the correct specification for the example data file.
At this point you can use the “MSM in King County, WA” data that is preloaded or click on the choose file button on the left panel and select the data “data_KC_sim”.
-the first 10 rows of the data will be displayed.
Click on the Examine Data tab at the top of the app.
From the Examine Data tab you can look at an overview of your data which includes the distribution of observations across demographic groups as well as the percentage of data within demographic group with the testing history data needed for the back calculation.
In this example you should see that there are 287 men in the 21-25 year old category, which represents 19% of the total sample, 76% of them have testing history data, 10% do not having testing history data, and for 14% the testing history data is missing.
By returning to the Examine Data tab you can again look at an overview of your data which includes the distribution of observations across demographic groups as well as the percentage of data within demographic group with the testing history data needed for the back calculation. However, the data is now limited to just the selected subgroup. In this case only re4spondes with a race=white.
At this point return to the Load Data screen, select the Optional: Subgroups tab and select “All” from the dropdown menu so that all of the data will be used through the remainder of the example.
After the data have been reviewed the analysis can be executed by selecting the calculate TID tab at the top of the page. The TID are automatically calculated and displayed using three different assumptions. The TID are the basis for the back calculation. The three sets of assumtions use in calculating TID are:
Base Case - Missing testing history data are considered missing at random and are excluded from calculating the TID. The probability of infection is uniformly distributed between the time of last negative test and time of diagnosis.
Worst Case (Observed) - Missing testing history data are considered missing at random and are excluded from calculating the TID. Infection is assumed to occur immediately following the date of last negative test, a worst case assumption.
Worst Case (Missing data) - Missing testing history data are imputed using the assumption that infection occurred either 18 years prior to diagnosis or at age 16, whichever is more recent. For cases with testing history, infection is assumed to occur immediately following the date of last negative test.
Clicking the Calculate TID tab will calcualte the TID and display the estimated distibution of the proportion of HIV cases that remain undiagnosed over time since infection. Because TID is calculated using three different sets of assumptions the plot includes all three alternatives. In this example only about 50% of cases are undiagnosed after 1 year in the base case, but in the worst case scenarios over 75% of cases remain undiagnosed after 1 year. You will also note that at 18 years the undiagnosed fraction drops top zero in all cases. This is due to an assumption in the model that infection can not remain undiagnosed for longer that 18 years.
Click on the Backcalculate infections tab at the top of the app to get to the backcalculation section. Clicking on the run backcalculation button on the left will run the back-calculation. This may take a few moments.
The plot and the table that are generated after the backcalculation is completed provide the same information in different formats. Each provides quarterly diagnosis counts along with estimated quarterly incidence counts and undiagnosed counts. While the plot shows these statistics for each quarter within the data’s time period, the table summarizes them over time. The incidence and the number of undiagnosed cases are generate under the three different set of assumptions for calculating the TID. As you can see in the plot below, in this example the different assumptions had little impact on the estimated incidence but a substantial impact on the estimates of the number of undiagnosed cases.
If the user is familiar with R and has HIVBackCalc installed on their system this section provides a brief overview of the HIVBackCalc functionality.
After HIVBackCalc has opened in the web browser, the panel on the left can be used to choose a data file.
Once the data are loaded a data summary is created and can be viewed by selected the Examine Data tab at the top of the app. Under the Examine Data tab there are three sub-tabs.
Overview - provides the number of observations in each demographic group as well as the fraction of the total sample in each demographic group. The overview also includes the percentage of each subgroup that has testing history data which will be used to calculate the time between infection and diagnosis (TID) which is the basis of the back-calculation.
Diagnosis - Graphical representation of the number of cases diagnosed during each quarter year of the sample time frame.
Testing Histories - This graph provides the percentage of the sample during each water that had a prior negative test, did not have a prior negative test, or had missing testing history data.
After the data have been reviewed the analysis can be executed by selecting the calculate TID tab at the top of the page. The TID are automatically calculated and displayed using three different assumptions. The TID are the basis for the back calculation.
Base Case - Missing testing history data are considered missing at random and are excluded from calculating the TID. The probability of infection is uniformly distributed between the time of last negative test and time of diagnosis.
Worst Case (Observed) - Missing testing history data are considered missing at random and are excluded from calculating the TID. Infection is assumed to occur immediately following the date of last negative test, a worst case assumption.
Worst Case (Missing data) - Missing testing history data are imputed using the assumption that infection occurred either 18 years prior to diagnosis or at age 16, whichever is more recent. For cases with testing history, infection is assumed to occur immediately following the date of last negative test.
Select the Backcalculate infections tab at the top of the screen. Click on the run Infections button on the left. This may take a few minutes to complete.
Once the calculations are complete the HIVBackCalc app will produce a plot and a table. The top half of the plot provides counts of the number of newly diagnosed cases in each quarter as well as the predicted incidence under each of the assumptions TID calculation assumptions mentioned in the previous step. The lower half of the plot provides counts of undiagnosed cases at each quarter under each of the three TID calculation assumptions. The table provides the same information in tabular form.
To close the interface, return to R/RStudio and hit the Esc button.
The HIVBackCalc tool can also be run online using the online Shiny App.To run HIVBackCalc online click on the link below and follow the instructions for running the application or your own computer starting at step 3.4.
-Warning: data uploaded to the online version of HIVBackCalc are not secure!!!